Skip to content

Optimize BuildFQName function #1665

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 1, 2024
Merged

Conversation

jkroepke
Copy link
Member

The BuildFQName function is used on many exporters multiple times.

The overall benefit might be quite low, but the I'm happy to contribute an optimization.

strings.Builder is available since go 1.10 and should me the current requirements.

% benchstat BenchmarkBuildFQName1.txt BenchmarkBuildFQName2.txt                                             
goos: darwin
goarch: amd64
pkg: github.com/prometheus/client_golang/prometheus
cpu: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
               │ BenchmarkBuildFQName1.txt │      BenchmarkBuildFQName2.txt      │
               │          sec/op           │   sec/op     vs base                │
BuildFQName1-8                 60.43n ± 0%   45.05n ± 0%  -25.44% (p=0.000 n=20)
BuildFQName2-8                 2.310n ± 1%   2.333n ± 1%   +1.00% (p=0.038 n=20)
BuildFQName3-8                 71.14n ± 1%   50.13n ± 0%  -29.53% (p=0.000 n=20)
geomean                        21.49n        17.40n       -19.04%

Go test commands:

go test -bench="BenchmarkBuildFQName" -run=^BenchmarkBuildFQName -count=20 | tee BenchmarkBuildFQName1.txt

Benchmark File:

func BenchmarkBuildFQName1(b *testing.B) {
	for i := 0; i < b.N; i++ {
		prometheus.BuildFQName("node", "", "usage_seconds_total")
	}
}

func BenchmarkBuildFQName2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		prometheus.BuildFQName("node", "cpu", "")
	}
}

func BenchmarkBuildFQName3(b *testing.B) {
	for i := 0; i < b.N; i++ {
		prometheus.BuildFQName("node", "cpu", "usage_seconds_total")
	}
}

Signed-off-by: Jan-Otto Kröpke <[email protected]>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful, thanks!

@bwplotka bwplotka merged commit d3f2840 into prometheus:main Nov 1, 2024
10 checks passed
@jkroepke jkroepke deleted the BuildFQName branch November 1, 2024 20:00
@dswarbrick
Copy link
Contributor

dswarbrick commented Jan 21, 2025

I'm skeptical that the extra overhead of using strings.Builder makes sense for such a short string. Indeed, the strings.Builder approach is barely any different here. But if we're going for optimization, sometimes the simplest and most explicit code is the fastest, to let the compiler optimize it more efficiently. A simple string concatenation is about twice as fast as the strings.Builder approach.

func BuildFQNameConcat(namespace, subsystem, name string) string {
    if name == "" {
        return ""
    }   

    switch {
    case namespace != "" && subsystem != "": 
        return namespace + "_" + subsystem + "_" + name
    case namespace != "": 
        return namespace + "_" + name
    case subsystem != "": 
        return subsystem + "_" + name
    }   

    return name
}

Benchmarking all three approaches:

$ go test -bench="BenchmarkBuildFQName"
goos: linux
goarch: amd64
pkg: foo
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
BenchmarkBuildFQNameOrig1-16            35745230                37.27 ns/op
BenchmarkBuildFQNameOrig2-16            831726489                1.377 ns/op
BenchmarkBuildFQNameOrig3-16            32682994                46.34 ns/op
BenchmarkBuildFQNameKroepke1-16         38300875                40.45 ns/op
BenchmarkBuildFQNameKroepke2-16         862387366                1.433 ns/op
BenchmarkBuildFQNameKroepke3-16         41427100                47.05 ns/op
BenchmarkBuildFQNameConcat1-16          77409106                15.20 ns/op
BenchmarkBuildFQNameConcat2-16          1000000000               0.2183 ns/op
BenchmarkBuildFQNameConcat3-16          54762304                24.28 ns/op

Just for the record, the second benchmark of each implementation is moot, since that passes an empty name to the function, which will return very quickly due to:

    if name == "" {
        return ""
    }

@jkroepke
Copy link
Member Author

No clue, why my contribution is even slower as the original one. The only different I saw is that I used -count=20 as additional parameter to get and average result of multiple runs.

As I know, the performance of string concat depends on memory speed, since each string concat operation copy memory. The compiler optimization is limited on that region.
string.Builder it self is described a "build string efficiently" by minimizes memory copying. That what the go team advertise. Not sure, how much overhead has string.Builder. It offers an byte slice with will converted to a string. string concat does similar things.

source code:

It could be possible that your memory speed on your 11th intel cpu is much higher than on my old 4th gen which result into much better results on string concats.

Just for the record, the second benchmark of each implementation is moot, since that passes an empty name to the function, which will return very quickly due to:

If there is short-path, why the times are so different? It make no sense to me.

BenchmarkBuildFQNameOrig2-16            831726489                1.377 ns/op
BenchmarkBuildFQNameKroepke2-16         862387366                1.433 ns/op
BenchmarkBuildFQNameConcat2-16          1000000000               0.2183 ns/op

@dswarbrick
Copy link
Contributor

I've used strings.Builder before, and it certainly has its uses. The fact that it's internally using a backing byte slice and calling append() effectively makes allocations (and copies) exponentially less frequent, the larger the string gets. But that's significant overkill for this use case, where the combined length of namespace, subsystem and name is probably no more than ~100 bytes max.

I think the reason why the simple string concatenation approach is significantly faster is that the compiler is inlining that code. If I force it to not inline with //go:noinline, then the results look more or less the same as the other two approaches:

BenchmarkBuildFQNameOrig1-16            36150464                36.90 ns/op
BenchmarkBuildFQNameOrig2-16            848673825                1.460 ns/op
BenchmarkBuildFQNameOrig3-16            31585210                37.74 ns/op
BenchmarkBuildFQNameKroepke1-16         38613002                26.72 ns/op
BenchmarkBuildFQNameKroepke2-16         798602870                1.448 ns/op
BenchmarkBuildFQNameKroepke3-16         31035693                34.34 ns/op
BenchmarkBuildFQNameConcat1-16          38535470                30.34 ns/op
BenchmarkBuildFQNameConcat2-16          766743506                1.686 ns/op
BenchmarkBuildFQNameConcat3-16          26398447                44.38 ns/op

The inlining of that function explains why the empty-name fast path is ... faster.

Since you suspect that it is dependent on CPU generation and memory speed, I also ran the benchmark on a quite old Intel Core i3 (without the noinline pragma)

$ go test -bench="BenchmarkBuildFQName"
goos: linux
goarch: amd64
pkg: foo
cpu: Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz
BenchmarkBuildFQNameOrig1-4             13225622                86.58 ns/op
BenchmarkBuildFQNameOrig2-4             365059735                3.282 ns/op
BenchmarkBuildFQNameOrig3-4             11768422               100.4 ns/op
BenchmarkBuildFQNameKroepke1-4          18770256                62.09 ns/op
BenchmarkBuildFQNameKroepke2-4          363927945                3.279 ns/op
BenchmarkBuildFQNameKroepke3-4          16410830                72.53 ns/op
BenchmarkBuildFQNameConcat1-4           32128861                36.78 ns/op
BenchmarkBuildFQNameConcat2-4           1000000000               0.4373 ns/op
BenchmarkBuildFQNameConcat3-4           21617942                54.17 ns/op

In summary, the strings.Builder approach is somewhat faster than the original strings.Join approach, but the simple concatenation (which the compiler sees fit to inline) is faster still.

@jkroepke
Copy link
Member Author

In summary, the strings.Builder approach is somewhat faster than the original strings.Join approach

At least, this results matches my results. I had never proposed strings.Builder, if it would be slower than the original one.

Good to know that string concat is faster in specific situations. Thats something what I learned today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants